6 research outputs found
SQ Lower Bounds for Learning Bounded Covariance GMMs
We study the complexity of learning mixtures of separated Gaussians with
common unknown bounded covariance matrix. Specifically, we focus on learning
Gaussian mixture models (GMMs) on ℝ^d whose component means are pairwise
separated and whose common unknown covariance matrix is suitably bounded. In
this work, we prove that any Statistical Query (SQ) algorithm for this problem
requires complexity essentially matching that of the best known learning
algorithms for this family of GMMs. In the special case where the separation
is of a particular order, we additionally obtain
fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds
imply similar lower bounds for low-degree polynomial tests. Conceptually, our
results provide evidence that known algorithms for this problem are nearly best
possible.
Estimating the Number of Induced Subgraphs from Incomplete Data and Neighborhood Queries
We consider a natural setting where network parameters are estimated from noisy and incomplete information about the network. More specifically, we investigate how to efficiently estimate the number of small subgraphs (e.g., edges, triangles, etc.) based on full access to one or two noisy and incomplete samples of a large underlying network, together with a few queries revealing the neighborhoods of carefully selected vertices. After specifying a random generator that removes edges from the underlying graph, we present estimators with strong provable performance guarantees, which exploit information from the noisy network samples and query a constant number of the most important vertices for the estimation. Our experimental evaluation shows that, in practice, a single noisy network sample and a few hundred neighborhood queries suffice to accurately estimate the number of triangles in networks with millions of vertices and edges.
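A toy illustration of the inverse-probability idea behind such estimators (a generic sketch, not the estimators from the paper; the edge-survival probability `p` is assumed known here): if each edge of the underlying graph survives independently with probability p, then each triangle survives with probability p³, so rescaling the observed triangle count by 1/p³ gives an unbiased estimate.

```python
from itertools import combinations

def count_triangles(edges):
    """Count triangles in a graph given as a set of frozenset edges."""
    nodes = set().union(*edges) if edges else set()
    adj = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    # Brute-force check of all vertex triples; fine for a toy example.
    return sum(1 for a, b, c in combinations(sorted(nodes), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def estimate_triangles(sampled_edges, p):
    """Inverse-probability estimate: each true edge survives independently
    with probability p, so each triangle survives with probability p**3."""
    return count_triangles(sampled_edges) / p ** 3
```

With p = 1 the sample is the full graph and the estimate is exact; for p < 1 the estimate remains unbiased, but its variance grows as p shrinks, which is where the paper's extra neighborhood queries become valuable.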
Streaming Algorithms for High-Dimensional Robust Statistics
We study high-dimensional robust statistics tasks in the streaming model. A
recent line of work obtained computationally efficient algorithms for a range
of high-dimensional robust estimation tasks. Unfortunately, all previous
algorithms require storing the entire dataset, incurring memory at least
quadratic in the dimension. In this work, we develop the first efficient
streaming algorithms for high-dimensional robust statistics with near-optimal
memory requirements (up to logarithmic factors). Our main result is for the
task of high-dimensional robust mean estimation in (a strengthening of) Huber's
contamination model. We give an efficient single-pass streaming algorithm for
this task with near-optimal error guarantees and space complexity nearly-linear
in the dimension. As a corollary, we obtain streaming algorithms with
near-optimal space complexity for several more complex tasks, including robust
covariance estimation, robust regression, and, more generally, robust
stochastic optimization.
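To make the memory point concrete, here is a minimal single-pass sketch in the same spirit (a coordinate-wise median-of-means baseline; this is a standard textbook technique with much weaker guarantees, not the paper's algorithm, and the function name is hypothetical):

```python
import numpy as np

def streaming_median_of_means(stream, d, n_buckets=11):
    """Single-pass robust mean sketch: maintain n_buckets running sums
    (O(n_buckets * d) memory, independent of the stream length) and
    return the coordinate-wise median of the bucket means.
    A simple baseline, not the algorithm from the paper."""
    sums = np.zeros((n_buckets, d))
    counts = np.zeros(n_buckets)
    for i, x in enumerate(stream):
        b = i % n_buckets          # round-robin bucket assignment
        sums[b] += x
        counts[b] += 1
    # Guard against empty buckets on very short streams.
    means = sums / np.maximum(counts, 1)[:, None]
    return np.median(means, axis=0)
```

A few grossly corrupted points can ruin the empirical mean, but they can contaminate only a minority of buckets, so the median of bucket means stays close to the true mean while using memory linear in the dimension.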
Robust Sparse Mean Estimation via Sum of Squares
We study the problem of high-dimensional sparse mean estimation in the
presence of an ε-fraction of adversarial outliers. Prior work obtained
sample-efficient and computationally efficient algorithms for this task for
identity-covariance subgaussian distributions. In this work, we develop the
first efficient algorithms for robust sparse mean estimation without a priori
knowledge of the covariance. For distributions on ℝ^d with
"certifiably bounded" t-th moments and sufficiently light tails, our
algorithm achieves a quantitative error guarantee with correspondingly
bounded sample complexity. For the special case of the Gaussian
distribution, our algorithm achieves a near-optimal error guarantee. Our
algorithms follow the Sum-of-Squares-based "proofs to algorithms" approach. We
complement our upper bounds with Statistical Query and low-degree polynomial
testing lower bounds, providing evidence that the sample-time-error tradeoffs
achieved by our algorithms are qualitatively the best possible.
Comment: To appear in COLT 202
List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering
We study the problem of list-decodable sparse mean estimation. Specifically,
for a parameter α ∈ (0, 1/2), we are given m points in
ℝ^d, of which ⌊αm⌋ are i.i.d. samples from a
distribution with unknown k-sparse mean μ. No assumptions are made on
the remaining points, which form the majority of the dataset. The goal is to
return a small list of candidates containing a vector μ̂ such that
‖μ̂ − μ‖₂ is small. Prior work had studied the problem of
list-decodable mean estimation in the dense setting. In this work, we develop a
novel, conceptually simpler technique for list-decodable mean estimation. As
the main application of our approach, we provide the first sample and
computationally efficient algorithm for list-decodable sparse mean estimation.
In particular, for distributions with ``certifiably bounded'' t-th moments in
k-sparse directions and sufficiently light tails, our algorithm achieves
quantitative error, sample-complexity, and running-time guarantees. For the
special case of Gaussian inliers, our algorithm achieves the optimal error
guarantee with quasi-polynomial sample and
computational complexity. We complement our upper bounds with nearly-matching
statistical query and low-degree polynomial testing lower bounds.
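To see why a *list* of candidates is the right output format when inliers are a minority (a conceptual illustration only; this greedy ball-covering is not the difference-of-pairs filter from the paper, and the function name and parameters are hypothetical):

```python
import numpy as np

def list_decode_means(X, alpha, radius):
    """Greedily cover the dataset with balls of the given radius and
    report each sufficiently large ball's mean. If an alpha-fraction of
    the points are inliers concentrated at scale `radius` around the true
    mean, some ball captures mostly inliers, so the true mean is close to
    at least one candidate on the returned list."""
    remaining = X.copy()
    candidates = []
    min_size = max(1, int(alpha * len(X) / 2))  # ignore tiny clusters
    while len(remaining) >= min_size:
        center = remaining[0]
        dists = np.linalg.norm(remaining - center, axis=1)
        ball = remaining[dists <= radius]
        if len(ball) >= min_size:
            candidates.append(ball.mean(axis=0))
        remaining = remaining[dists > radius]  # always shrinks; terminates
    return candidates
```

Since the outliers form the majority, no single estimate can be guaranteed correct; the best one can hope for is a short list, of length roughly O(1/α), that contains a good candidate, which is exactly the guarantee the paper's algorithm achieves in the sparse setting.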